Abstract


In  this  work  we  demonstrate  that  the  Mizuno-Todd-Ye  predictor- 
corrector  primal-dual  interior-point  method  for  linear  programming 
generates  iteration  sequences  that  converge  to  the  analytic  center  of 
the  solution  set. 


1  Introduction  and  Preliminaries 

The basic primal-dual interior-point method for linear programming was
originally proposed by Kojima, Mizuno, and Yoshise [6], based on earlier work
of Megiddo [11]. This algorithm can be viewed as a perturbed (centered) and
damped Newton's method applied to the first-order conditions for a particular
standard form linear program. They established linear convergence of the
duality gap sequence to zero and an iteration complexity of $O(nL)$ for their
basic algorithm. Immediately afterwards, Kojima, Mizuno, and Yoshise in a
second paper [7], and Monteiro and Adler [15], proposed algorithms that fit in
the original Kojima-Mizuno-Yoshise framework and established linear
convergence of the duality gap sequence to zero and a superior iteration
complexity of $O(\sqrt{n}L)$ for their versions of the algorithm. Soon after,
Mizuno, Todd, and Ye [14] considered a predictor-corrector variant of the
Kojima-Mizuno-Yoshise basic algorithm. In their algorithm the predictor step
is a damped Newton step and the corrector step is a perturbed (centered)
Newton step. Mizuno, Todd, and Ye also established linear convergence of the
duality gap sequence to zero and an iteration complexity of $O(\sqrt{n}L)$ for
their predictor-corrector algorithm.

The literature now abounds with papers concerned with issues related to
primal-dual interior-point methods. Moreover, when we discuss convergence
or convergence attributes (including complexity) of one of these algorithms,
we are in general discussing convergence of the duality gap to zero. This
interpretation has become standard in the area even though convergence of the
duality gap sequence does not imply convergence of the iteration sequence.
The convergence of the iteration sequence is certainly an important issue in
its own right. Indeed, the earlier works on fast (superlinear) convergence
of the duality gap sequence to zero, i.e., Zhang, Tapia, and Dennis [26],
Zhang, Tapia, and Potra [27], Zhang and Tapia [23], Ye, Tapia, and Zhang
[21], and McShane [10], all made the assumption that the iteration sequence
converged.

In some applications (see, e.g., Charnes, Cooper, and Thrall [2]) it is
important to obtain a solution that is not near the boundary of the solution
set. Hence there is significant value in designing a primal-dual interior-point
method for linear programming that converges to the analytic center of the
solution set.

Tapia, Zhang, and Ye [17] derived conditions under which the iteration
sequence generated by the Kojima-Mizuno-Yoshise primal-dual interior-point
method converged. These conditions were essentially the conditions for fast
(superlinear) convergence established by Zhang, Tapia, and Dennis [26] (see
also Zhang and Tapia [24]). Zhang and Tapia [25] derived conditions under
which this iteration sequence converged to the analytic center, assuming
that the sequence converged. However, these conditions are not completely
compatible with the Tapia-Zhang-Ye conditions for the convergence of the
iteration sequence.

Ye, Güler, Tapia, and Zhang [20], and independently Mehrotra [13], based
on the work of Ye, Tapia, and Zhang [21], demonstrated that the
Mizuno-Todd-Ye predictor-corrector algorithm in all cases gives quadratic
convergence of the duality gap sequence to zero. A highlight of this
contribution was that, for the first time, the assumption of iteration
sequence convergence was not needed. Soon after, Zhang and Tapia [24] removed
this assumption from the Zhang-Tapia-Dennis theory for superlinear
convergence. Quite recently, Zhang and El-Bakry [22] were able to show that a
modified version of the Mizuno-Todd-Ye predictor-corrector algorithm had the
property that the iteration sequence that it generated converged to the
analytic center. Their modified algorithm dynamically chose the steplength in
the Newton predictor step so that the corrector step would asymptotically
enforce arbitrarily close proximity to the central path.

In  this  paper  we  show  that  the  predictor-corrector  algorithm  as  originally 
stated  by  Mizuno,  Todd,  and  Ye  has  the  property  that  the  iteration  sequences 
(predictor-step  sequence  and  corrector-step  sequence)  it  generates  converge 
to  the  analytic  center  of  the  solution  set. 

The paper is organized as follows. In the remainder of this section we
introduce our notation and several fundamental background notions. In
Section 2 we discuss the primal-dual Newton step and establish some properties
concerning this step. Some mathematical tools concerning projections and
scalings are derived in Section 3. Central path issues are discussed in Section
4. The Mizuno-Todd-Ye predictor-corrector algorithm and some of its
properties are presented in Section 5. In Section 6 we combine all our previous
discussion and in Theorem 6.1 demonstrate that the Mizuno-Todd-Ye
algorithm generates sequences that converge to the analytic center of the
solution set.

Given a vector $x$, $d$, $\phi$, the corresponding upper case symbol denotes (as
usual) the diagonal matrix $X$, $D$, $\Phi$ defined by the vector.

We denote component-wise operations on vectors by the usual notations
for real numbers. Thus, given two vectors $u, v$ of the same dimension, $uv$,
$u/v$, etc. denote the vectors with components $u_i v_i$, $u_i/v_i$, etc. This
notation is consistent as long as component-wise operations are given
precedence over matrix operations. Note that $uv = Uv$ and, if $A$ is a matrix,
then $Auv = AUv$, but in general $Auv \neq (Au)v$.
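
The precedence rule can be checked on a small example. The sketch below uses arbitrary illustrative data (the vectors `u`, `v` and the square matrix `A` are assumptions chosen only so that all three products are defined); the helper names are hypothetical.

```python
# Component-wise vector notation versus matrix operations.
# All data below are arbitrary illustrative values.

def matvec(M, z):
    """Ordinary matrix-vector product for a list-of-rows matrix."""
    return [sum(m * zj for m, zj in zip(row, z)) for row in M]

def diag(z):
    """Diagonal matrix built from the vector z."""
    n = len(z)
    return [[z[i] if i == j else 0.0 for j in range(n)] for i in range(n)]

def matmul(M, K):
    """Ordinary matrix-matrix product."""
    cols = list(zip(*K))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in M]

u = [1.0, 2.0, 3.0]
v = [4.0, 5.0, 6.0]
A = [[1.0, 0.0, 2.0],
     [0.0, 1.0, 1.0],
     [1.0, 1.0, 0.0]]

uv = [ui * vi for ui, vi in zip(u, v)]            # component-wise product uv
Uv = matvec(diag(u), v)                           # diagonal matrix U times v
A_uv = matvec(A, uv)                              # Auv: component-wise product first
AUv = matvec(matmul(A, diag(u)), v)               # (AU)v
Au_v = [p * q for p, q in zip(matvec(A, u), v)]   # (Au)v: a different vector

print(uv == Uv, A_uv == AUv, A_uv == Au_v)
```

The last comparison fails in general because multiplying by $A$ and multiplying component-wise by $v$ do not commute.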

We frequently use the $O(\cdot)$ and $\Omega(\cdot)$ notation to express a
relationship between functions. Our most common usage will be associated with
a sequence $\{x^k\}$ of vectors and a sequence $\{\mu^k\}$ of positive real
numbers. In this case $x = O(\mu)$, or $x^k = O(\mu^k)$, means that there is a
constant $K$ (dependent on problem data) such that for every $k \in \mathbb{N}$,
$\|x^k\| \le K\mu^k$. Similarly, $x = \Omega(\mu)$, or $x^k = \Omega(\mu^k)$,
means that there is $\epsilon > 0$ such that for every $k \in \mathbb{N}$,
$\|x^k\| \ge \epsilon\mu^k$.

The primal and dual linear programming problems are:

\[
\text{(LP)} \qquad
\begin{array}{ll}
\text{minimize}   & c^T x \\
\text{subject to} & Ax = b \\
                  & x \ge 0,
\end{array}
\]

and

\[
\text{(LD)} \qquad
\begin{array}{ll}
\text{maximize}   & b^T y \\
\text{subject to} & A^T y + s = c \\
                  & s \ge 0,
\end{array}
\]

where $c \in \mathbb{R}^n$, $b \in \mathbb{R}^m$, $A \in \mathbb{R}^{m \times n}$.
We assume that both problems have optimal solutions, and that the sets of
optimal solutions are bounded. This is equivalent to the requirement that both
feasible sets contain points satisfying all inequalities strictly.

Given any feasible primal-dual pair $(\tilde{x}, \tilde{s})$, the problems can
be rewritten as

\[
\text{(LP)} \qquad
\begin{array}{ll}
\text{minimize}   & \tilde{s}^T x \\
\text{subject to} & Ax = b \\
                  & x \ge 0,
\end{array}
\]

and

\[
\text{(LD)} \qquad
\begin{array}{ll}
\text{minimize}   & \tilde{x}^T s \\
\text{subject to} & Bs = Bc \\
                  & s \ge 0,
\end{array}
\]

where $B^T$ is a matrix whose columns span the null space of $A$. Popular
choices for $B^T$ are an orthonormal basis for the null space of $A$ and
$B = P_A$, the projection matrix into the null space of $A$.

The feasible sets for (LP) and (LD) will be denoted respectively by
$\mathcal{P}$ and $\mathcal{D}$. Their relative interiors will be respectively
$\mathcal{P}^0$ and $\mathcal{D}^0$.

The set of optimal solutions for the primal-dual pair of problems
constitutes a face $F = F_P \times F_D$ of the polyhedron of feasible solutions,
where $F_P$ and $F_D$ are respectively the primal and dual optimal faces. By
hypothesis, this face is a compact set. It is well known that this face is
characterized by a partition $\{B, N\}$ of the set of indices $\{1, \dots, n\}$
such that $F_P = \{x \in \mathcal{P} \mid x_N = 0\}$ and
$F_D = \{s \in \mathcal{D} \mid s_B = 0\}$. In the relative interior of the
face, $x_B > 0$ and $s_N > 0$.

We  study  algorithms  that  generate  sequences  that  converge  to  the  optimal 
face.  Our  main  concern  is  with  the  behaviour  of  the  iterates  as  they  approach
the  optimal  face.  We  want  this  to  happen  in  such  a  manner  that  all  limit 
points  are  in  the  relative  interior  of  the  optimal  face.  We  shall  see  later  on 
how  this  condition  can  be  enforced. 

Given $\mu > 0$, $\mu \in \mathbb{R}$, the pair $(x, s)$ of feasible primal and
dual solutions is the central point $(x(\mu), s(\mu))$ associated with $\mu$ if
and only if

\[ xs = \mu e, \]

where $e$ stands for the vector of all ones, with dimension given by the context.

The central path is the curve in $\mathbb{R}^{2n}$ parametrized by the positive
real $\mu$, i.e.,

\[ \mu \mapsto (x(\mu), s(\mu)). \]

Thus $(x, s)$ is a central point if and only if

\[
\begin{array}{rcl}
xs   & = & \mu e \\
Ax   & = & b \\
Bs   & = & Bc \\
x, s & > & 0,
\end{array}
\tag{1}
\]

where the columns of $B^T$ span the null space of $A$.

The first-order or Karush-Kuhn-Tucker (KKT) conditions for problem
(LP) (or (LD)) are

\[
\begin{array}{rcl}
xs        & =   & 0 \\
Ax        & =   & b \\
A^T y + s & =   & c \\
x, s      & \ge & 0.
\end{array}
\]

The perturbed KKT conditions, for perturbation parameter $\mu > 0$, are

\[
\begin{array}{rcl}
xs        & = & \mu e \\
Ax        & = & b \\
A^T y + s & = & c \\
x, s      & > & 0.
\end{array}
\tag{2}
\]

Observe that the perturbed KKT conditions are merely the defining
relations for the central path and (2) can equivalently be written as (1).
Essentially all primal-dual interior-point methods for problem (LP) consist of
some variant of the damped Newton method applied to the perturbed KKT
conditions (1) or (2).
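
As an illustration of these definitions, the following sketch evaluates the central path of a small hypothetical linear program (not taken from the paper): minimize $x_3$ subject to $x_1 + x_2 + x_3 = 1$, $x \ge 0$. Its primal optimal face is $\{x_3 = 0,\ x_1 + x_2 = 1\}$, whose analytic center is $(1/2, 1/2, 0)$. For this problem the perturbed KKT conditions (2) reduce to a scalar quadratic, so the central point can be written in closed form; the function name `central_point` is an assumption made for the example.

```python
import math

def central_point(mu):
    """Central point (x, y, s) of: min x3  s.t.  x1 + x2 + x3 = 1, x >= 0.

    Here c = (0, 0, 1), A = [1 1 1], b = 1, so s = c - A^T y = (-y, -y, 1 - y).
    With t = -y > 0, the conditions x_i s_i = mu and Ax = b reduce to
    t^2 + (1 - 3 mu) t - 2 mu = 0; its positive root is evaluated in a
    form that avoids cancellation for small mu.
    """
    b_coef = 1.0 - 3.0 * mu
    t = 4.0 * mu / (b_coef + math.sqrt(b_coef * b_coef + 8.0 * mu))
    s = [t, t, 1.0 + t]
    x = [mu / si for si in s]
    return x, -t, s

# Follow the central path as mu decreases towards zero.
for mu in (1.0, 1e-2, 1e-4, 1e-6):
    x, y, s = central_point(mu)
    print(mu, [round(xi, 6) for xi in x])
```

As $\mu$ decreases, the printed primal points approach $(1/2, 1/2, 0)$, the analytic center of the optimal face, rather than one of its vertices.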


2  Newton  Steps 

When dealing with an iterative procedure we will use the superscript 0 to
denote the previous iterate, no superscript to denote the current iterate, and
a subscript of + to denote the subsequent iterate. In two-step algorithms like
the Mizuno-Todd-Ye algorithm described in Section 5, this notation will apply
to the current iterate, the intermediate iterate, and the final iterate.

Given a strictly feasible pair $(x, s)$, we shall define three parameters:

\[
\begin{array}{rcl}
\mu(x, s)  & = & s^T x / n, \\
w(x, s)    & = & xs / \mu(x, s), \\
\phi(x, s) & = & 1 / \sqrt{w(x, s)}.
\end{array}
\]

The first two parameters will be extensively studied below. The parameter
$\phi$ has no special meaning, and is introduced because it will simplify many
formulas in the text. When no confusion can arise, we drop the reference to
the variables, and continue to use other symbols in a consistent manner. For
example, $w = w(x, s)$ or $\phi^0 = \phi(x^0, s^0)$.

Given a strictly feasible pair $(x, s)$, we are interested in finding
$(x_+, s_+) = (x, s) + (u, v)$ that solves (1) or (2) with
$\mu = \gamma\mu(x, s)$, where $\gamma \in [0, 1]$. The Newton equation for (1)
at $(x, s)$ with $\mu$ replaced by $\gamma\mu$ can be written

\[
\begin{array}{l}
xv + su = -xs + \gamma\mu(x, s)e \\
u \in \mathcal{N}(A) \\
v \in \mathcal{R}(A^T),
\end{array}
\tag{3}
\]

where as usual $\mathcal{N}$ denotes null space and $\mathcal{R}$ denotes range
space. The solution of (3) is obtained by scaling the equations. Define the
scaling matrix by $d = \sqrt{x/s}$, $D = \mathrm{diag}(d_1, \dots, d_n)$, and
the scaling

\[ (p, q) \mapsto (\bar{p}, \bar{q}) = (d^{-1}p, dq) \]

for general $(p, q) \in \mathbb{R}^n \times \mathbb{R}^n$.

The relationship between $d$ and the vector $\phi$ defined above is

\[ d = \frac{x\phi}{\sqrt{\mu}} = \frac{\sqrt{\mu}}{s\phi}. \tag{4} \]

When applied to the original pair $(x, s)$, the resulting scaled pair will be

\[ (\bar{x}, \bar{s}) = (\sqrt{xs}, \sqrt{xs}). \tag{5} \]
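
Relations (4) and (5) are easy to confirm numerically. The sketch below uses an arbitrary strictly positive pair (the values of `x` and `s` are illustrative assumptions) and computes $\mu$, $w$, $\phi$ and $d$ exactly as defined above.

```python
import math

x = [0.45, 0.40, 0.15]   # illustrative strictly positive primal point
s = [1.0, 1.0, 2.0]      # illustrative strictly positive dual slack
n = len(x)

mu = sum(xi * si for xi, si in zip(x, s)) / n             # mu(x, s) = s^T x / n
w = [xi * si / mu for xi, si in zip(x, s)]                # w = xs / mu
phi = [1.0 / math.sqrt(wi) for wi in w]                   # phi = 1 / sqrt(w)

d = [math.sqrt(xi / si) for xi, si in zip(x, s)]          # d = sqrt(x / s)
d_from_x = [xi * fi / math.sqrt(mu) for xi, fi in zip(x, phi)]    # x phi / sqrt(mu)
d_from_s = [math.sqrt(mu) / (si * fi) for si, fi in zip(s, phi)]  # sqrt(mu) / (s phi)

x_bar = [xi / di for xi, di in zip(x, d)]                 # scaled primal, d^{-1} x
s_bar = [di * si for di, si in zip(d, s)]                 # scaled dual, d s
sqrt_xs = [math.sqrt(xi * si) for xi, si in zip(x, s)]

print(d, d_from_x, d_from_s)
print(x_bar, s_bar, sqrt_xs)
```

Both expressions in (4) reproduce $d$, and the scaled primal and dual vectors coincide with $\sqrt{xs}$ as stated in (5).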

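After scaling, the system (3) becomes

\[
\begin{array}{l}
\bar{x}\bar{v} + \bar{s}\bar{u} = -\bar{x}\bar{s} + \gamma\mu e \\
\bar{u} \in \mathcal{N}(AD) \\
\bar{v} \in \mathcal{R}(DA^T).
\end{array}
\tag{6}
\]

Since $\bar{x} = \bar{s} > 0$ by (5), the first equation can be multiplied by
$\bar{x}^{-1}$, leading to

\[ \bar{v} + \bar{u} = -\bar{s} + \gamma\mu\bar{x}^{-1}, \]

and the solution is simply the orthogonal decomposition of the vector
$-\bar{s} + \gamma\mu\bar{x}^{-1}$ along $\mathcal{N}(AD)$ and its orthogonal
complement. Let $P_{AD}$ be the projection matrix into $\mathcal{N}(AD)$, and
$\tilde{P}_{AD} = I - P_{AD}$. Then

\[
\begin{array}{rcl}
\bar{u} & = & P_{AD}(-\bar{s} + \gamma\mu\bar{x}^{-1}) \\
\bar{v} & = & \tilde{P}_{AD}(-\bar{s} + \gamma\mu\bar{x}^{-1}).
\end{array}
\tag{7}
\]

The Newton step in original coordinates is given by $u = d\bar{u}$ and
$v = d^{-1}\bar{v}$.

A convenient formulation is obtained by substituting $d = x\phi/\sqrt{\mu}$ and
$d^{-1} = s\phi/\sqrt{\mu}$:

\[
\begin{array}{rcl}
u & = & -x\phi\, P_{AX\Phi}(\phi^{-1} - \gamma\phi) \\
v & = & -s\phi\, \tilde{P}_{AX\Phi}(\phi^{-1} - \gamma\phi).
\end{array}
\tag{8}
\]

We now describe two alternative ways of writing the expression for $u$ (the
expressions for $v$ are similar).

Using the definition of $w$,

\[ u = -x\phi\, P_{AX\Phi}\,\phi(w - \gamma e). \tag{9} \]

Observing the symmetrical formulation of (LD), we see that for any two
feasible dual slacks $s^1, s^2$, $P_{AD}ds^1 = P_{AD}ds^2 = P_{AD}dc$. In
particular, we can choose a fixed dual slack and use it in (7). We shall choose
$s^*$, the analytic center of the dual optimal face, and write

\[ u = -dP_{AD}d(s^* - \gamma\mu x^{-1}). \]

By the same process as above,

\[ u = -x\phi\, P_{AX\Phi}\,\phi\left(\frac{xs^*}{\mu} - \gamma e\right). \tag{10} \]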

In Section 5, when we study the Mizuno, Todd, and Ye predictor-corrector
algorithm, we will have need for the following proposition.


Proposition 2.1 Let $(x, s)$ and $(\tilde{x}, \tilde{s})$ be feasible pairs.
Consider $x_+ = x + u$ and $s_+ = s + v$, where $(u, v)$ satisfies

\[
\begin{array}{l}
\tilde{x}v + \tilde{s}u = -(1 - \gamma)xs + \beta e \\
u \in \mathcal{N}(A) \\
v \in \mathcal{R}(A^T).
\end{array}
\]

Then

\[ \mu(x_+, s_+) = \gamma\mu(x, s) + \beta. \tag{11} \]

Proof. Left multiplying by $e^T$, we obtain

\[ \tilde{x}^T v + \tilde{s}^T u = -(1 - \gamma)x^T s + n\beta. \]

From the definition,

\[ x_+^T s_+ = x^T s + x^T v + s^T u, \]

since $u^T v = 0$. But $\tilde{x}^T v = x^T v$, because
$\tilde{x} - x \in \mathcal{N}(A)$ and $v \in \mathcal{R}(A^T)$, and
similarly $\tilde{s}^T u = s^T u$. Substituting in the expressions above we
immediately obtain (11). ■

Two special cases of problem (3) have been studied extensively in the
literature. They are

(i) $\gamma = 0$: The resulting directions $(h_x^1, h_s^1)$ are called the
primal-dual affine scaling directions (or pure Newton directions).

(ii) $\gamma = 1$: The resulting directions $(h_x^2, h_s^2)$ are called the
constant gap centering directions.

The first equation of the Newton system (3) can be rewritten as

\[ xv + su = -(1 - \gamma)xs + \gamma(-xs + \mu e). \tag{12} \]

This is a combination of the solutions of two systems with

\[
\begin{array}{rcl}
xv^1 + su^1 & = & -xs \\
xv^2 + su^2 & = & -xs + \mu e,
\end{array}
\tag{13}
\]

where $\mu = \mu(x, s)$. The complete solution is given by

\[ (u, v) = (1 - \gamma)(u^1, v^1) + \gamma(u^2, v^2). \tag{14} \]

It  is  quite  common  to  use  these  two  directions  separately,  possibly  as  a  way 
to  simplify  the  analysis.  This  is  done  by  the  predictor-corrector  algorithms 
that  we  study  in  this  paper. 
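
The decomposition (12)-(14) and the gap reduction $\mu(x_+, s_+) = \gamma\mu$ can be checked on a toy problem. The sketch below assumes that $A$ consists of a single row `a`, in which case the Newton system (3) has a closed-form solution; the data `a`, `x`, `s` and the function name `newton_step` are illustrative assumptions, not part of the paper.

```python
def newton_step(a, x, s, gamma):
    """Solve the Newton system (3) when A consists of the single row a.

    v lies in the range of A^T, so v = lam * a for a scalar lam. The first
    equation then gives u = (r - x v) / s, and lam is fixed by a . u = 0.
    """
    n = len(x)
    mu = sum(xi * si for xi, si in zip(x, s)) / n
    r = [-xi * si + gamma * mu for xi, si in zip(x, s)]     # right-hand side of (3)
    lam = (sum(ai * ri / si for ai, ri, si in zip(a, r, s))
           / sum(ai * ai * xi / si for ai, xi, si in zip(a, x, s)))
    v = [lam * ai for ai in a]
    u = [(ri - xi * vi) / si for ri, xi, vi, si in zip(r, x, v, s)]
    return u, v

a = [1.0, 1.0, 1.0]        # A = [1 1 1]
x = [0.45, 0.40, 0.15]     # satisfies a . x = 1
s = [1.0, 1.0, 2.0]        # s = c - A^T y with c = (0, 0, 1), y = -1
gamma = 0.3
n = len(x)

u1, v1 = newton_step(a, x, s, 0.0)     # affine-scaling direction
u2, v2 = newton_step(a, x, s, 1.0)     # constant-gap centering direction
u, v = newton_step(a, x, s, gamma)

# Combination (14) of the two special directions.
u_comb = [(1 - gamma) * p + gamma * q for p, q in zip(u1, u2)]
v_comb = [(1 - gamma) * p + gamma * q for p, q in zip(v1, v2)]

mu = sum(xi * si for xi, si in zip(x, s)) / n
mu_plus = sum((xi + ui) * (si + vi) for xi, ui, si, vi in zip(x, u, s, v)) / n
print(mu_plus, gamma * mu)
```

Because the right-hand side of (3) is affine in $\gamma$ and the constraints on $(u, v)$ are linear, the direction for any $\gamma$ is the stated convex combination of the two special directions.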


3  Mathematical  Tools 

In this section we state some lemmas on projections and scalings that will
be useful in the analysis below.

3.1 Properties of Scaled Projections

In this subsection we slightly extend results published by Megiddo and Shub
[12].

Consider the primal feasible set for (LP),

\[ \mathcal{P} = \{x \in \mathbb{R}^n \mid Ax = b,\ x \ge 0\} \]

and the map $h$ defined for $d \in \mathbb{R}^n$, $d \ge 0$, $d \ne 0$, and
$p \in \mathbb{R}^n$ by

\[ (d, p) \mapsto h(d, p) = P_{AD}\,p, \tag{15} \]

where $P_{AD}$ represents the projection matrix into the null space of $AD$.

We study the behaviour of this map when $d > 0$, $d \to \bar{d}$ and
$p \to \bar{p}$, where $\bar{d} \ge 0$, $\bar{d} \ne 0$, and
$\bar{p} \in \mathbb{R}^n$.

Given $\bar{d}$, we define the index sets
$B = \{i = 1, \dots, n \mid \bar{d}_i > 0\}$ and
$N = \{i = 1, \dots, n \mid \bar{d}_i = 0\}$. The variables with indices in $B$
are called the large variables, and the others small variables. It is difficult
to describe the behaviour of the small variables $h_N(d, p)$ of the scaled
projection defined above; the theory of Megiddo and Shub concerns the large
variables $h_B(d, p)$. We shall describe these results conveniently extended to
fit our needs.

By definition of projection, $h(d, p)$ solves the problem

\[
\begin{array}{ll}
\text{minimize}   & \|h_N - p_N\|^2 + \|h_B - p_B\|^2 \\
\text{subject to} & A_B D_B h_B = -A_N D_N h_N.
\end{array}
\tag{16}
\]

Assume now that $h_N(d, p)$ is given. Then $h_B(d, p)$ solves

\[
\begin{array}{ll}
\text{minimize}   & \|h_B - p_B\| \\
\text{subject to} & A_B D_B h_B = -A_N D_N h_N(d, p).
\end{array}
\tag{17}
\]

Thus, since $h_N(\bar{d}, \bar{p})$ is finite and $\bar{D}_N = 0$,
$h_B(\bar{d}, \bar{p}) = P_{A_B\bar{D}_B}\,\bar{p}_B$. We shall study the
point-to-set mapping $\theta$ defined for $d \in \mathbb{R}^n_+$ and
$p \in \mathbb{R}^n$ by

\[
(d, p) \mapsto \theta(d, p) =
\{h_B \in \mathbb{R}^{|B|} \mid A_B D_B h_B = -A_N D_N h_N(d, p)\},
\tag{18}
\]

near a pair $(\bar{d}, \bar{p}) \in \mathbb{R}^n_+ \times \mathbb{R}^n$ with
$\bar{d} \ne 0$. Note that at this point,
$\theta(\bar{d}, \bar{p}) = \mathcal{N}(A_B\bar{D}_B)$.

Lemma 3.1 The point-to-set map defined by (18) is continuous at
$(\bar{d}, \bar{p}) \in \mathbb{R}^n_+ \times \mathbb{R}^n$ with $\bar{d} \ne 0$.

Proof.

(i) Upper semi-continuity: Consider a sequence
$(d^k, p^k) \to (\bar{d}, \bar{p})$ and $h_B^k$ such that
$A_B D_B^k h_B^k = -A_N D_N^k h_N(d^k, p^k)$ and $h_B^k$ converges to some point
$\bar{h}_B$. We must prove that $A_B\bar{D}_B\bar{h}_B = 0$.

The sequence $h_N(d^k, p^k)$ is bounded, because
$\|h_N(d^k, p^k)\| \le \|p^k\|$, since $h(d^k, p^k)$ is a projection. Since
$D_N^k \to 0$, it follows that $A_B D_B^k h_B^k \to 0$ and consequently
$A_B\bar{D}_B\bar{h}_B = 0$, completing this part of the proof.

(ii) Lower semi-continuity: Consider now an arbitrary point
$\bar{h}_B \in \mathcal{N}(A_B\bar{D}_B)$. Given an arbitrary sequence
$(d^k, p^k) \in \mathbb{R}^n_+ \times \mathbb{R}^n$ such that
$(d^k, p^k) \to (\bar{d}, \bar{p})$, we must construct $h_B^k$ such that
$A_B D_B^k h_B^k = -A_N D_N^k h_N(d^k, p^k)$ and $h_B^k \to \bar{h}_B$.

Consider $(d^k, p^k) \in \mathbb{R}^n_+ \times \mathbb{R}^n$ with
$(d^k, p^k) \to (\bar{d}, \bar{p})$. Since $d_B^k \to \bar{d}_B > 0$, we lose no
generality by assuming that $d_B^k > 0$ for all $k$. Define
$h_N^k = h_N(d^k, p^k)$. For each $k$ let $\tilde{h}_B^k$ be a minimum-norm
solution of $A_B D_B^k h_B = -A_N D_N^k h_N^k$, where the norm is the weighted
Euclidean norm $\|D_B^k\,\cdot\,\|$. If $A_B^+$ denotes the pseudo-inverse of
$A_B$, then we can write
$\tilde{h}_B^k = -(D_B^k)^{-1}A_B^+A_N D_N^k h_N^k$. It follows that
$\tilde{h}_B^k \to 0$, since $d_B^k \to \bar{d}_B > 0$ and
$D_N^k h_N^k \to 0$. Construct

\[ h_B^k = (D_B^k)^{-1}\bar{D}_B\bar{h}_B + \tilde{h}_B^k. \tag{19} \]

Then

\[
A_B D_B^k h_B^k = A_B\bar{D}_B\bar{h}_B + A_B D_B^k\tilde{h}_B^k
= -A_N D_N^k h_N^k,
\]

since $\bar{h}_B \in \mathcal{N}(A_B\bar{D}_B)$. Thus
$h_B^k \in \theta(d^k, p^k)$. Since $D_B^k \to \bar{D}_B > 0$ and
$\tilde{h}_B^k \to 0$, it follows that $h_B^k \to \bar{h}_B$, completing the
proof. ■

Lemma 3.2 Let $h(d, p)$ be given by (15). Consider
$(\bar{d}, \bar{p}) \in \mathbb{R}^n_+ \times \mathbb{R}^n$, $\bar{d} \ne 0$,
and $(d^k, p^k) \in \mathbb{R}^n_+ \times \mathbb{R}^n$ such that
$(d^k, p^k) \to (\bar{d}, \bar{p})$. Then

(i) $h_B(d^k, p^k) \to h_B(\bar{d}, \bar{p}) = P_{A_B\bar{D}_B}\,\bar{p}_B$.

(ii) If $\bar{p}_N = 0$, then $h_N(d^k, p^k) \to 0$.

Proof. (i) The map
$(d, p) \mapsto \mathrm{argmin}\{\|h_B - p_B\| : h_B \in \theta(d, p)\}$ is well
defined by the uniqueness of the minimizer. It is continuous at
$(\bar{d}, \bar{p})$ as a consequence of the continuity of the point-to-set map
$\theta$ and the continuity of projections (see for example Hogan [4]). From the
comment immediately preceding (17) we see that

\[ h_B(d^k, p^k) = \mathrm{argmin}\{\|h_B - p_B^k\| : h_B \in \theta(d^k, p^k)\}. \]

Hence from continuity $h_B(d^k, p^k) \to h_B(\bar{d}, \bar{p})$. From the
comment immediately following (17) we see that
$h_B(\bar{d}, \bar{p}) = P_{A_B\bar{D}_B}\,\bar{p}_B$. This establishes part (i).

(ii) Here we follow a similar proof in Megiddo and Shub [12]. Assume
that $\bar{p}_N = 0$ and by contradiction that for some sequence
$d^k \to \bar{d}$, $p^k \to \bar{p}$ we have
$h_N(d^k, p^k) \to \bar{h}_N \ne 0$. Define $\epsilon = \|\bar{h}_N\|^2 > 0$.
We have:

\[
\|h(d^k, p^k) - p^k\|^2 = \|h_B(d^k, p^k) - p_B^k\|^2 + \|h_N(d^k, p^k) - p_N^k\|^2.
\]

By (i), $h_B(d^k, p^k) \to \bar{h}_B$, where
$\bar{h}_B = P_{A_B\bar{D}_B}\,\bar{p}_B$. For sufficiently large $k$,

\[ \|h_B(d^k, p^k) - p_B^k\|^2 \ge \|\bar{h}_B - \bar{p}_B\|^2 - \epsilon/2, \tag{20} \]

while the second term on the right-hand side above converges to $\epsilon$,
because $p_N^k \to 0$.

Now construct the following sequence:

\[ h_B^k = (D_B^k)^{-1}\bar{D}_B\bar{h}_B, \qquad h_N^k = 0. \]

It follows that $h_B^k \to \bar{h}_B$, and $h^k \in \mathcal{N}(AD^k)$, since
$AD^k h^k = A_B\bar{D}_B\bar{h}_B = 0$. Moreover,
$\|h^k - p^k\|^2 \to \|\bar{h}_B - \bar{p}_B\|^2$.

Comparing this with (20), we have for $k$ sufficiently large
$\|h^k - p^k\| < \|h(d^k, p^k) - p^k\|$ and $h^k \in \mathcal{N}(AD^k)$,
contradicting the definition of $h(d^k, p^k) = P_{AD^k}\,p^k$ and completing the
proof. ■
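
Lemma 3.2 can be illustrated numerically. The sketch below assumes a matrix $A$ with a single row `a`, so that $P_{AD}$ has a closed form; the data and the function name `scaled_projection` are hypothetical. The scalings $d^k$ converge to $(1, 2, 0)$, so $B = \{1, 2\}$ and $N = \{3\}$, and the vector $p$ is held fixed with $p_N = 0$.

```python
def scaled_projection(a, d, p):
    """h(d, p) = P_{AD} p when A consists of the single row a.

    AD is then the row g = (a_i d_i), and projecting onto its null space
    removes the component of p along g.
    """
    g = [ai * di for ai, di in zip(a, d)]
    coef = sum(gi * pi for gi, pi in zip(g, p)) / sum(gi * gi for gi in g)
    return [pi - coef * gi for pi, gi in zip(p, g)]

a = [1.0, 1.0, 1.0]
p = [1.0, 0.0, 0.0]        # p_N = 0 for N = {3}
d_bar_B = [1.0, 2.0]       # limit of the large components, B = {1, 2}

# Limit predicted by Lemma 3.2(i): projection of p_B onto N(A_B D_B).
h_limit_B = scaled_projection(a[:2], d_bar_B, p[:2])

for k in (1, 10, 1000, 10**6):
    d_k = [1.0 + 1.0 / k, 2.0 - 1.0 / k, 1.0 / k]   # d^k > 0, d^k -> (1, 2, 0)
    h_k = scaled_projection(a, d_k, p)
    print(k, [round(hi, 6) for hi in h_k])

print("limit of large variables:", h_limit_B)
```

The large components of $h(d^k, p)$ approach the projection computed from the $B$ columns alone, and the small component tends to zero, in agreement with parts (i) and (ii).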


3.2  Shifted  Scalings 

This  subsection  contains  some  useful  consequences  of  scalings  on  projections 
and  norms.  The  first  lemma  concerns  projections  and  slightly  shifted  scalings. 

Lemma 3.3 Let $q \in \mathbb{R}^n$ be such that $\|q - e\|_\infty = \alpha$,
where $\alpha \in (0, 0.25)$, and consider the projections $h = P_A\,p$,
$\tilde{h} = qP_{AQ}\,qp$. Then $\|\tilde{h} - h\| \le 3\alpha\|h\|$.

Proof. Note that since $p = h + A^T w$ for some $w \in \mathbb{R}^m$,

\[ qp = qh + (AQ)^T w \]

and thus

\[ P_{AQ}\,qp = P_{AQ}\,qh. \]

It follows that

\[ q^{-1}\tilde{h} = P_{AQ}\,qh. \]

On the other hand, by definition of projection,

\[ qh = P_{AQ}\,qh + y, \]

where $y \in \mathcal{R}(QA^T)$. Merging the last expressions, we get

\[ qh = q^{-1}\tilde{h} + y, \]

where $q^{-1}\tilde{h} \in \mathcal{N}(AQ)$ and $y \in \mathcal{R}(QA^T)$.
Subtracting $q^{-1}h \in \mathcal{N}(AQ)$ from both sides,

\[ (q - q^{-1})h = q^{-1}(\tilde{h} - h) + y, \]

and from the orthogonality of the right-hand side terms,

\[ \|q^{-1}(\tilde{h} - h)\| \le \|(q^{-1} - q)h\|. \]

Now use the following facts:
$\|\tilde{h} - h\| \le \|q\|_\infty\|q^{-1}(\tilde{h} - h)\|$ and
$\|(q^{-1} - q)h\| \le \|q^{-1} - q\|_\infty\|h\|$. Combining these three
expressions leads to

\[ \|\tilde{h} - h\| \le \|q\|_\infty\|q^{-1} - q\|_\infty\|h\|. \]

But
$\|q\|_\infty\|q^{-1} - q\|_\infty \le (1 + \alpha)\left(\frac{1}{1 - \alpha} - (1 - \alpha)\right) \le 3\alpha$,
which is easily verified for $\alpha \in (0, 0.25)$, completing the proof. ■
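
The bound of Lemma 3.3 can be spot-checked on a small instance. The sketch below again assumes that $A$ has a single row `a`, so both projections are rank-one corrections; the row, the vector `p` and the shift `q` are arbitrary illustrative choices with $\|q - e\|_\infty = \alpha = 0.2$.

```python
import math

def project_null_row(g, z):
    """Orthogonal projection of z onto the null space of the single row g."""
    coef = sum(gi * zi for gi, zi in zip(g, z)) / sum(gi * gi for gi in g)
    return [zi - coef * gi for zi, gi in zip(z, g)]

def norm(z):
    """Euclidean norm."""
    return math.sqrt(sum(zi * zi for zi in z))

a = [1.0, 2.0, -1.0, 0.5]      # the single row of A (illustrative)
p = [0.3, -1.2, 0.7, 2.0]      # vector to be projected (illustrative)
alpha = 0.2
# A shifted scaling with ||q - e||_inf = alpha.
q = [1.0 + alpha, 1.0 - alpha, 1.0 + 0.5 * alpha, 1.0 - 0.1 * alpha]

h = project_null_row(a, p)                        # h = P_A p
aq = [ai * qi for ai, qi in zip(a, q)]            # the row of AQ
qp = [qi * pi for qi, pi in zip(q, p)]            # component-wise product qp
h_tilde = [qi * zi for qi, zi in zip(q, project_null_row(aq, qp))]   # q P_{AQ} qp

gap = norm([t - u for t, u in zip(h_tilde, h)])   # ||h_tilde - h||
bound = 3.0 * alpha * norm(h)                     # 3 alpha ||h||
print(gap, bound)
```

Both `h` and `h_tilde` lie in the null space of $A$; the lemma bounds how far the shifted scaling can move the projection.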

Our second lemma concerns scaled norms. Given a vector
$x \in \mathbb{R}^n_{++}$, the following map defines a norm:

\[ h \in \mathbb{R}^n \mapsto \|h\|_x = \|x^{-1}h\|. \]

This is the Euclidean norm of the vector corresponding to $h$ after a scaling
$\bar{h} = x^{-1}h$. This norm is widely used in interior point methods, because
it characterizes the proximity from a point to a central point in the following
sense: let $x(\mu)$ be the primal central point associated with the parameter
$\mu > 0$. If $\|x - x(\mu)\|_x \le \delta < 1$ then a Newton centering
iteration from $x$ produces an efficient centering step (which is usually
imprecisely stated as being in the region of quadratic convergence of Newton's
method).

In the same fashion as we defined the scaled Euclidean norm $\|h\|_x$, we
define the scaled norm $\|h\|_x^\infty$. The following lemma relates the scaled
norms for different reference points.

Lemma 3.4 Consider $x, y \in \mathbb{R}^n_{++}$, $h \in \mathbb{R}^n$,
$\alpha \in (0, 1)$. If either $\|x - y\|_x^\infty \le \alpha$ or
$\|x - y\|_y^\infty \le \alpha$, then

\[ \|h\|_x \le \frac{1}{1 - \alpha}\|h\|_y. \]

Proof. To begin with

\[
\|h\|_x = \left\|\frac{h}{x}\right\| = \left\|\frac{y}{x}\,\frac{h}{y}\right\|
\le \left\|\frac{y}{x}\right\|_\infty \|h\|_y.
\]

If $\|x - y\|_x^\infty \le \alpha$, then $|(x_i - y_i)/x_i| \le \alpha$, or
$1 - y_i/x_i \ge -\alpha$, which implies
$y_i/x_i \le 1 + \alpha \le 1/(1 - \alpha)$. In the other case,
$|(x_i - y_i)/y_i| \le \alpha$, or $x_i/y_i \ge 1 - \alpha$, which implies
$y_i/x_i \le 1/(1 - \alpha)$, completing the proof. ■
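
A direct numerical check of Lemma 3.4 is sketched below. The vectors `x`, `y` and `h` are arbitrary illustrative data, and the helper names are hypothetical; $\alpha$ is taken to be the scaled infinity-norm distance $\|x - y\|_x^\infty$ itself.

```python
import math

def scaled_norm(h, x):
    """||h||_x = ||x^{-1} h||, the Euclidean norm after scaling by x."""
    return math.sqrt(sum((hi / xi) ** 2 for hi, xi in zip(h, x)))

def scaled_inf_norm(h, x):
    """||h||_x^inf, the infinity norm after scaling by x."""
    return max(abs(hi / xi) for hi, xi in zip(h, x))

x = [1.0, 2.0, 0.5]      # illustrative positive reference point
y = [1.1, 1.8, 0.55]     # a nearby positive reference point
h = [0.3, -0.4, 0.1]     # arbitrary direction

alpha = scaled_inf_norm([xi - yi for xi, yi in zip(x, y)], x)   # ||x - y||_x^inf
lhs = scaled_norm(h, x)                     # ||h||_x
rhs = scaled_norm(h, y) / (1.0 - alpha)     # ||h||_y / (1 - alpha)
print(alpha, lhs, rhs)
```

The smaller $\alpha$ is, the closer the two scaled norms are, which is what allows bounds stated relative to one reference point to be transferred to another.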


4  Trajectories,  Centrality  and  Proximity 

The primal-dual central path defined above is contained in the set of interior
points and ends at a point $(x^*, s^*)$ in the relative interior of the optimal
face. This point is the analytic center of the face. See problem (24) for an
equivalent characterization. For more detail see McLinden [9] and Sonnevend
[16].

In  this  section  we  study  (primal-dual)  proximity  criteria  that  describe 
how  far  a  pair  (x,  s)  is  from  the  primal-dual  central  path,  then  study  (primal) 
proximity  criteria  to  evaluate  how  far  a  point  in  the  optimal  face  is  from  its 
analytic  center. 


4.1  Primal-Dual  Proximity 

Given an interior pair $(x, s)$ and a parameter $\mu > 0$ (not necessarily equal
to $\mu(x, s)$), the proximity of $(x, s)$ in relation to $(x(\mu), s(\mu))$ is
measured by

\[ \delta(x, s, \mu) = \left\|\frac{xs}{\mu} - e\right\|. \tag{21} \]

When $\mu = \mu(x, s)$, this is the proximity with relation to the central path,

\[ \delta(x, s) = \left\|\frac{xs}{\mu(x, s)} - e\right\| = \|w(x, s) - e\|. \tag{22} \]

Let us compute the proximity at the pair $(x_+, s_+)$ resulting from the
Newton step described in (3), with $\mu = \mu(x, s)$. We have

\[
\begin{array}{rcl}
x_+ s_+ & = & (x + u)(s + v) \\
        & = & xs + xv + su + uv \\
        & = & \gamma\mu e + uv.
\end{array}
\]

But $\mu(x_+, s_+) = \gamma\mu$, since $u^T v = 0$, and thus

\[
\delta(x_+, s_+) = \left\|e + \frac{uv}{\gamma\mu} - e\right\|
= \frac{\|uv\|}{\gamma\mu} = \frac{\|uv\|}{\mu(x_+, s_+)}.
\]

A fundamental result on the effect of the Newton step on proximity is given
in the following lemma. This result is due to Mizuno, Todd, and Ye and can
be found in [14].


Lemma 4.1 Consider an interior pair $(x, s)$ and a parameter $\mu_+ > 0$. If
$\delta(x, s, \mu_+) = \delta < 0.5$, then $\delta(x_+, s_+) \le \delta^2/\sqrt{2}$.
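
The contraction in Lemma 4.1 can be observed on a toy problem. The sketch below assumes the lemma is applied with target $\mu_+ = \mu(x, s)$, that is, to the centering step $\gamma = 1$ of (3), and that $A$ consists of a single row `a` so the step has a closed form. The data and function names are illustrative assumptions.

```python
import math

def proximity(x, s):
    """delta(x, s) = || xs / mu(x, s) - e ||, as in (22)."""
    n = len(x)
    mu = sum(xi * si for xi, si in zip(x, s)) / n
    return math.sqrt(sum((xi * si / mu - 1.0) ** 2 for xi, si in zip(x, s)))

def centering_step(a, x, s):
    """Newton step (3) with gamma = 1 when A consists of the single row a."""
    n = len(x)
    mu = sum(xi * si for xi, si in zip(x, s)) / n
    r = [mu - xi * si for xi, si in zip(x, s)]          # -xs + mu e
    lam = (sum(ai * ri / si for ai, ri, si in zip(a, r, s))
           / sum(ai * ai * xi / si for ai, xi, si in zip(a, x, s)))
    v = [lam * ai for ai in a]                          # v in R(A^T)
    u = [(ri - xi * vi) / si for ri, xi, vi, si in zip(r, x, v, s)]  # u in N(A)
    x_plus = [xi + ui for xi, ui in zip(x, u)]
    s_plus = [si + vi for si, vi in zip(s, v)]
    return x_plus, s_plus

a = [1.0, 1.0, 1.0]
x = [0.45, 0.40, 0.15]     # strictly feasible for a . x = 1
s = [1.0, 1.0, 2.0]        # s = c - A^T y with c = (0, 0, 1), y = -1

delta = proximity(x, s)
x_plus, s_plus = centering_step(a, x, s)
delta_plus = proximity(x_plus, s_plus)
print(delta, delta_plus, delta ** 2 / math.sqrt(2))
```

Starting from a proximity below 0.5, a single centering step brings the pair well inside the bound $\delta^2/\sqrt{2}$ in this example.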

The primal-dual affine-scaling directions are the solution of (3) with
$\gamma = 0$. These directions associated with each interior feasible pair
$(x, s)$ generate a continuous vector field, which extends continuously to the
boundary.

This vector field was thoroughly studied by Adler and Monteiro [1], who
describe the trajectories generated by it and the derivatives of these
trajectories. The trajectories are parameterized by $\mu$, and there is one
trajectory passing through each interior pair $(x, s)$.

For each interior pair $(x, s)$, we defined the vector
$w(x, s) = xs/\mu(x, s)$. Each trajectory is associated with this vector in the
following two ways:

(i) The trajectory associated with $w > 0$ is composed of the pairs $(x, s)$
such that

\[ \frac{xs}{\mu(x, s)} = w. \]

In particular, the central path is the trajectory associated with $w = e$.

(ii) The trajectory associated with $w > 0$ is composed of the minimizer
pairs of the parameterized primal-dual penalized function

\[ x^T s - \mu\sum_{i=1}^n w_i \ln x_i - \mu\sum_{i=1}^n w_i \ln s_i. \]

Each trajectory is composed of interior points, and ends in the relative
interior of the optimal face.

In what follows, we assume that the vectors $w(x, s)$ are always in a compact
set defined by

\[ \|w(x, s) - e\| \le \alpha, \]

where $\alpha \in (0, 1)$.

When the weight vectors $w$ are in a compact set bounded away from
the boundary of the positive orthant, the trajectories end in the relative
interior of the optimal face. Specifically, at the limit of the minimizers of
the parameterized barrier function, we have

\[
\begin{array}{rcl}
x^*(w) & = & \mathrm{argmin}\,\{-\sum_{i \in B} w_i \ln x_i \mid x \in F_P\} \\
s^*(w) & = & \mathrm{argmin}\,\{-\sum_{i \in N} w_i \ln s_i \mid s \in F_D\}.
\end{array}
\]

In particular, the central path ends at the analytic center of the optimal face,
$(x^*, s^*) = (x^*(e), s^*(e))$.

The  sets  of  end  points  of  all  trajectories  for  such  weights  w  are  sets  of 
minimizers  of  parameterized  continuously  differentiable  functions,  and  are 
compact.  It  is  easy  to  see  that  the  nonzero  variables  are  all  bounded  away 
from  zero,  because  the  compact  sets  are  in  the  relative  interior  of  the  optimal 
faces.  This  is  also  clear  from  the  fact  that  the  barrier  functions  become 
arbitrarily  large  as  the  boundaries  of  the  faces  are  approached. 

Similarly,  all  the  trajectories  in  the  bundle  associated  with  this  compact 
set  of  parameter  vectors  are  in  the  relative  interior  of  the  feasible  set,  and 
bounded  away  from  the  non-optimal  faces. 


4.2  Primal  Proximity 

We shall summarize some facts about the analytic center of a polytope, and
derive properties of descent methods for finding the center.

Consider the primal centering problem

\[
\begin{array}{ll}
\text{minimize}   & p(x) = -\sum_{i=1}^n \ln(x_i) \\
\text{subject to} & Ax = b \\
                  & x > 0,
\end{array}
\tag{24}
\]

where $b \in \mathbb{R}^m$, $A \in \mathbb{R}^{m \times n}$, such that its
feasible region, $S^0$, is nonempty, with compact closure $S$. The analytic
center of $S$ is the unique optimal solution of (24),

\[ \chi = \mathrm{argmin}_{x \in S^0}\, p(x). \]

The analytic center was defined by Sonnevend [16]; see also McLinden [9].
Its properties and the Newton primal centering algorithm (SSD algorithm) are
described in Gonzaga [3]. The following facts come from this latter reference.

Given a point $x \in S^0$, the Newton centering direction from $x$ is given by
$h(x) = x\bar{h}(x)$, where

\[ \bar{h}(x) = -P_{AX}X\nabla p(x) = P_{AX}\,e \]

is the centering direction after scaling the problem so that the point $x$ is
taken to $e$.

The (primal) proximity of $x$ in relation to $\chi$, defined above, is given by

\[ \delta(x) = \|\bar{h}(x)\| = \|h(x)\|_x, \tag{25} \]

where $\|\cdot\|_x$ is the norm relative to $x$.

The following important results are described for example in [3]. Let
$x \in S^0$ be such that $\delta(x) = \delta < 1$, then

\[
\begin{array}{rcl}
\|x - \chi\|_x     & \le & \dfrac{\delta}{1 - \delta} \\
\delta(x + h(x))   & \le & \delta^2.
\end{array}
\tag{26}
\]

The first result above gives an upper bound for $\|x - \chi\|_x$. We shall also
need a lower bound for this distance, and this will be provided by the next
lemma.

Lemma 4.2 If $\delta(x) = \delta < 0.5$, then

\[ \|x - \chi\|_x \ge \frac{1 - 2\delta}{1 - \delta}\,\delta. \]

In particular, if $\delta \le 0.09$, then
$\|x - \chi\|_x \in [0.9\delta, 1.1\delta]$.

Proof. Let $x_+ = x + h(x)$. We know that $\|h(x)\|_x = \delta$, and that
$\delta(x_+) \le \delta^2$. It follows from (26) that

\[ \|x_+ - \chi\|_{x_+} \le \frac{\delta^2}{1 - \delta^2} \]

and hence

\[ \|x_+ - \chi\|_x \le \left\|\frac{x_+}{x}\right\|_\infty \frac{\delta^2}{1 - \delta^2}. \]

But $x_+/x = e + h(x)/x$, and thus

\[ \left\|\frac{x_+}{x}\right\|_\infty \le 1 + \left\|\frac{h(x)}{x}\right\|_\infty \le 1 + \delta. \]

It follows that

\[ \|x_+ - \chi\|_x \le (1 + \delta)\frac{\delta^2}{1 - \delta^2} = \frac{\delta^2}{1 - \delta}. \]

Finally,

\[
\begin{array}{rcl}
\|x - \chi\|_x & =   & \|x - x_+ + x_+ - \chi\|_x \\
               & \ge & \|x - x_+\|_x - \|x_+ - \chi\|_x \\
               & \ge & \delta - \dfrac{\delta^2}{1 - \delta} \\
               & =   & \dfrac{1 - 2\delta}{1 - \delta}\,\delta.
\end{array}
\]

The numeric values are obtained by substitution, completing the proof. ■

This lemma shows that when the proximity measure is small, it is indeed a
good approximation to the actual scaled distance to the center. The values
$\delta \le 0.09$ will be quite reasonable for our analysis below.
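
These bounds can be observed on the simplest polytope with a known center. The sketch below assumes the feasible set is the unit simplex ($A = [1 \cdots 1]$, $b = 1$), whose analytic center is $\chi = e/n$ by symmetry; the starting point `x` and the function names are illustrative assumptions.

```python
import math

def centering_direction(x):
    """Scaled and unscaled Newton centering directions on the simplex
    {x > 0 : x_1 + ... + x_n = 1}, i.e. A = [1 ... 1], b = 1.

    Here AX is the row x, so h_bar = P_{AX} e = e - x (x . e) / (x . x).
    """
    coef = sum(x) / sum(xi * xi for xi in x)
    h_bar = [1.0 - coef * xi for xi in x]
    h = [xi * hi for xi, hi in zip(x, h_bar)]
    return h_bar, h

def norm(z):
    """Euclidean norm."""
    return math.sqrt(sum(zi * zi for zi in z))

x = [0.36, 0.33, 0.31]          # interior point of the simplex (illustrative)
n = len(x)
chi = [1.0 / n] * n             # analytic center of the simplex

h_bar, h = centering_direction(x)
delta = norm(h_bar)                                         # proximity (25)
dist = norm([(xi - ci) / xi for xi, ci in zip(x, chi)])     # ||x - chi||_x

x_plus = [xi + hi for xi, hi in zip(x, h)]                  # one centering step
delta_plus = norm(centering_direction(x_plus)[0])

lower = delta * (1.0 - 2.0 * delta) / (1.0 - delta)         # Lemma 4.2
upper = delta / (1.0 - delta)                               # first bound in (26)
print(lower, dist, upper)
print(delta_plus, delta ** 2)
```

The scaled distance to the center is sandwiched between the two bounds, and one centering step reduces the proximity to below $\delta^2$, as in (26).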

One  final  technical  result  also  will  be  useful  below.  It  reproduces  the 
bounds  above  using  the  norm  relative  to  y. 

Lemma  4.3  If  8 (x)  =  8  <  0.1,  then  for  x+  =  x  +  h(x), 


Il^+-X||x  <  I-05*2 
Ik-Xllx  >  0.755. 
Proof.

Using (26), $\|x^+ - \chi\|_{x^+} \le \delta^2/(1 - \delta^2)$, since $\delta(x^+) \le \delta^2$. Using Lemma
3.4 with $\alpha = \delta^2/(1 - \delta^2)$, we obtain $\|x^+ - \chi\|_\chi \le \delta^2/(1 - 2\delta^2)$. The first
result in the lemma follows from this with $\delta = 0.1$.

Using Lemma 4.2, $\|x - \chi\|_x \ge \delta(1 - 2\delta)/(1 - \delta)$. From (26), $\|x - \chi\|_x \le \delta/(1 - \delta)$.
Using Lemma 3.4 with $\alpha = \delta/(1 - \delta)$, we get $\|x - \chi\|_\chi \ge (1 - \alpha)\|x - \chi\|_x$.
Manipulating these expressions, we arrive at

$$\|x - \chi\|_\chi \ge \left(\frac{1 - 2\delta}{1 - \delta}\right)^2 \delta.$$

Substituting $\delta = 0.1$, we obtain the second result, therefore completing the
proof. ■
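The numeric constants in Lemmas 4.2 and 4.3 come from substituting the largest admissible value of $\delta$ into the closed-form bounds. The short Python sketch below evaluates those bounds; the function names are hypothetical helpers introduced only for this illustration and do not appear in the paper.

```python
def lemma_4_2_bounds(delta):
    """Lower bound (Lemma 4.2) and upper bound (26) on ||x - chi||_x."""
    lower = delta * (1 - 2 * delta) / (1 - delta)
    upper = delta / (1 - delta)
    return lower, upper


def lemma_4_3_constants(delta):
    """Factors c1, c2 such that ||x+ - chi||_chi <= c1 * delta^2
    and ||x - chi||_chi >= c2 * delta, as derived in the proof of Lemma 4.3."""
    c1 = 1 / (1 - 2 * delta ** 2)
    c2 = ((1 - 2 * delta) / (1 - delta)) ** 2
    return c1, c2


# Lemma 4.2 at delta = 0.09: the ratios to delta should lie in [0.9, 1.1].
lower, upper = lemma_4_2_bounds(0.09)
ratio_lower, ratio_upper = lower / 0.09, upper / 0.09

# Lemma 4.3 at delta = 0.1: c1 should not exceed 1.05, c2 should be at least 0.75.
c1, c2 = lemma_4_3_constants(0.1)
print(ratio_lower, ratio_upper, c1, c2)
```

Both ratios are monotone in $\delta$ (the lower one decreases and the upper one increases as $\delta$ grows), so checking the endpoint covers all smaller values of $\delta$.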

The primal centering direction $h(x)$ is the Newton direction for $p(\cdot)$ from
$x$, and it coincides with the steepest descent direction for $x = e$, i.e., $h(x)$ is
the Cauchy direction from $e$. To see this notice that
$h(x) = -x P_{AX}\, x \nabla p(x) = x P_{AX}\, x x^{-1}$.
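As an illustration of the centering direction and of the bounds (26) and Lemma 4.2, the following sketch computes $h(x) = x P_{AX} e$ with NumPy on a tiny example. The feasible set $\{x > 0 : x_1 + x_2 + x_3 = 3\}$ and the point $x$ are assumptions chosen for this sketch (its analytic center is $e$ by symmetry), and the function names are hypothetical.

```python
import numpy as np


def centering_direction(A, x):
    """h(x) = x * P_{AX} e, with P_{AX} the projection onto the null space of A X."""
    AX = A * x                       # scales column j of A by x_j
    e = np.ones_like(x)
    # P_{AX} e = e - (AX)^T (AX (AX)^T)^{-1} (AX) e
    y = np.linalg.solve(AX @ AX.T, AX @ e)
    return x * (e - AX.T @ y)


def proximity(A, x):
    """delta(x) = ||h(x)||_x = ||h(x) / x||."""
    return float(np.linalg.norm(centering_direction(A, x) / x))


# Illustrative data (not from the paper): simplex-like set with center chi = e.
A = np.array([[1.0, 1.0, 1.0]])
x = np.array([1.1, 0.9, 1.0])
chi = np.ones(3)

h = centering_direction(A, x)
d = proximity(A, x)
x_plus = x + h                               # one full centering step
d_plus = proximity(A, x_plus)
dist = float(np.linalg.norm((x - chi) / x))  # ||x - chi||_x
print(d, d_plus, dist)
```

On this example the step stays in the affine set ($A h = 0$), the proximity after the step is below $\delta^2$, and the scaled distance to the center lies between the two bounds of Lemma 4.2 and (26).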

Other  scalings  give  rise  to  descent  directions  that  are  in  general  not  as 
efficient  as  this  one.  We  shall  apply  Lemma  3.3  to  study  the  effect  of  slightly 
shifted  scalings  on  the  descent  directions. 

5  The  Mizuno-Todd-Ye  Algorithm 

The MTY algorithm is a path-following predictor-corrector algorithm. All
activity is restricted to a region near the central path, i.e., all points $(x,s)$
generated by the algorithm satisfy

$$\delta(x,s) = \|w(x,s) - e\| = \left\|\frac{xs}{\mu(x,s)} - e\right\| \le \alpha,$$

where $\alpha \in (0, 0.5)$.

Algorithm 5.1 Given $\alpha \le 0.3$, $(x^{0^1}, s^{0^1})$ such that $\delta(x^{0^1}, s^{0^1}) \le \alpha^2/\sqrt{2}$,
$k = 1$.

REPEAT

$x^0 := x^{0^k}$, $s^0 := s^{0^k}$.

Predictor: Given $(x^0, s^0)$ compute the (affine-scaling) step $(u^0, v^0)$, and
let $x = x^0 + u^0$, $s = s^0 + v^0$, where $(u^0, v^0)$ is defined by

$$x^0 v^0 + s^0 u^0 = -(1 - \gamma)\, x^0 s^0, \quad u^0 \in \mathcal{N}(A), \quad v^0 \in \mathcal{R}(A^T),$$

with $\gamma \in [0,1)$ such that $(x, s)$ is feasible and $\delta(x, s) \le \alpha$. (The
specific value of $\gamma$ will be discussed below.)

Corrector: Given $(x, s)$ compute the (centering) step $(u, v)$ and let
$x^+ = x + u$, $s^+ = s + v$, where $(u, v)$ is defined by

$$xv + su = -xs + \mu e, \quad u \in \mathcal{N}(A), \quad v \in \mathcal{R}(A^T),$$

with $\mu = \mu(x,s)$.

Subsequent iterate:

$x^{0^{k+1}} = x^+$, $s^{0^{k+1}} = s^+$.

$k = k + 1$

UNTIL convergence.

Observe that our $\gamma$ in the predictor step is effectively a steplength
parameter. To see this let us denote the predictor step by $(u^0(\gamma), v^0(\gamma))$ and
let $\theta = 1 - \gamma$. Then

$$\theta\,(u^0(0), v^0(0)) = (u^0(\gamma), v^0(\gamma))$$

and

$$(x,s) = (x^0,s^0) + \theta\,(u^0(0), v^0(0)),$$

which is the usual way of writing the MTY predictor step. The usual choice
for $\theta$ is $\theta_k$, the largest $\theta \in (0,1]$ such that $\delta(x(\theta),s(\theta)) \le \alpha$ for all $0 \le \theta \le \theta_k$.
For further detail see, for example, Section 2 of Ye, Güler, Tapia and Zhang
[20]. Hence our choice of $\gamma$ in the predictor step is $\gamma = 1 - \theta_k$, and can be
viewed as the smallest $\gamma \in [0,1)$ in the sense just described.

From Proposition 2.1 with $(x,s) = (x^0,s^0)$, $\gamma = \gamma$, and $\mu = 0$ we see that
from the predictor step we get $\mu(x,s) = \gamma\mu(x^0,s^0)$. Also, from the same
proposition with $(x,s) = (x,s)$, $\gamma = 0$, and $\mu = \mu(x,s)$ we see that from
the corrector step we get $\mu(x^+,s^+) = \mu(x,s)$. Hence we have
$\mu(x^+,s^+) = \mu(x,s) = \gamma\mu(x^0,s^0)$.
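The predictor and corrector systems of Algorithm 5.1 can be solved through the normal equations, which makes the identities $\mu(x,s) = \gamma\mu(x^0,s^0)$ and $\mu(x^+,s^+) = \mu(x,s)$ easy to observe numerically. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the data $A$, $x^0$, $s^0$, and in particular the backtracking rule for $\theta$ (which does not compute the largest $\theta_k$) are hypothetical choices made for this example.

```python
import numpy as np


def mu(x, s):
    """Normalized duality gap mu(x, s) = x^T s / n."""
    return float(x @ s) / x.size


def delta(x, s):
    """Proximity delta(x, s) = || x s / mu(x, s) - e ||."""
    return float(np.linalg.norm(x * s / mu(x, s) - 1.0))


def newton_step(A, x, s, r):
    """Solve x*v + s*u = r with u in N(A) and v in R(A^T)."""
    d2 = x / s
    # Writing v = A^T z, the condition A u = 0 gives (A D^2 A^T) z = A (r / s).
    z = np.linalg.solve((A * d2) @ A.T, A @ (r / s))
    v = A.T @ z
    u = (r - x * v) / s
    return u, v


def mty_step(A, x0, s0, alpha=0.3, shrink=0.9):
    # Full affine-scaling direction (gamma = 0); the damped step is theta times it.
    u0, v0 = newton_step(A, x0, s0, -x0 * s0)
    theta = 1.0
    while True:
        theta *= shrink  # assumed backtracking rule, NOT the largest theta_k
        x, s = x0 + theta * u0, s0 + theta * v0
        if x.min() > 0 and s.min() > 0 and delta(x, s) <= alpha:
            break
    gamma = 1.0 - theta
    # Corrector: x*v + s*u = -x*s + mu*e with mu = mu(x, s).
    u, v = newton_step(A, x, s, mu(x, s) - x * s)
    return gamma, (x, s), (x + u, s + v)


# Illustrative data: any full row rank A and a nearly centered pair (x0, s0).
A = np.array([[1.0, 1.0, 1.0, 1.0], [1.0, 2.0, 0.0, -1.0]])
x0 = np.array([1.0, 1.02, 0.98, 1.0])
s0 = np.ones(4)
gamma, (x, s), (x_plus, s_plus) = mty_step(A, x0, s0)
print(gamma, mu(x, s), mu(x_plus, s_plus), delta(x_plus, s_plus))
```

Only $A$ and the current pair are needed to form the steps, because both right-hand sides depend on $(x, s)$ alone; the vectors $b$ and $c$ enter only through feasibility of the starting pair.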

We now list some properties of this algorithm. Some proofs are presented
here for the sake of completeness. The proofs that are not given here can
be found in Mizuno, Todd, and Ye [14]. Mizuno, Todd, and Ye proved that
the algorithm is well defined in the sense that the centering step produces
$(x^+,s^+)$ such that $\delta(x^+, s^+) \le \alpha^2/\sqrt{2}$.

Bounds on the quantities appearing in the algorithm are given in the lemmas
below. Let $\{B,N\}$ be the optimal partition for the linear programming
problem, i.e., the index partition associated with the optimal face. As we
described in Subsection 4.1, the central path ends at the analytic center of
the optimal face, and the pairs $(x, s)$ such that $\|w(x,s) - e\| \le \alpha$ constitute
a neighborhood of the central path bounded away from the non-optimal
faces of the feasible polyhedron and correspond to a bundle of unweighted
affine-scaling trajectories. For $\alpha$ small, the bundle of trajectories ends in a
compact neighborhood of the analytic center of the optimal face, and so all
the sequences generated by the algorithm are in compact sets.

Hence, the algorithm behaves as follows. As the optimal face is approached
(and this happens in polynomial time), $x^k_N \to 0$, $s^k_B \to 0$ and
$x^k_B$, $s^k_N$ stay in small neighborhoods of $x^*_B$, $s^*_N$, the analytic centers of the
primal and dual optimal faces.

Lemma 5.1 Consider quantities generated by the MTY algorithm. Then

(i) $x_N = O(\mu)$, $s_B = O(\mu)$, $x^0_N = O(\mu^0)$, $s^0_B = O(\mu^0)$

(ii) $u^0 = O(\mu^0)$, $v^0 = O(\mu^0)$

(iii) $u_N = O(\mu)$, $v_B = O(\mu)$


Proof.  All  of  these  bounds  are  implicit  in  the  technical  results  given  in 
Section  3  of  Ye  et  al.  [20].  Specifically  (ii)  follows  from  Lemma  3.2  and 
Theorem  3.1.  The  tools  used  there  can  also  be  used  to  establish  (i)  and  (iii). 
Hence  we  will  not  include  a  proof  and  direct  the  reader  to  that  paper  for 
proofs.  ■ 

The lemma above shows that all the variations in $(x,s)$ due to an MTY
step are bounded by $O(\mu^0)$, with the exception of $u_B$ and $v_N$. These are the
variations in the large variables due to the corrector step.

6  Convergence  of  the  MTY  Algorithm 

In this section we establish the main result of the paper: the points generated
by the MTY algorithm always converge to the analytic center of the optimal
face. We shall assume that the optimal face is not a single point. Our
convergence proofs will be carried out for primal solutions. The symmetric
results for dual slacks can always be proved by the same methods using the
complete symmetry of conditions (1).

We begin by studying the map that results from the algorithm. Towards
this end we describe the relationship between primal-dual pairs $(x^0, s^0)$ and
the result $(x^+,s^+)$ of an MTY step originating at $(x^0, s^0)$. It is essential to
keep in mind that at this point we are not studying sequences generated by
the algorithm. We derive a lemma (a main result of the paper) on the boundary
behaviour of the algorithmic map for sequences with strong convergence
properties; a second lemma extends the result to nonconvergent sequences,
and provides the main convergence property of the algorithmic map*. We
then consider a sequence generated by the algorithm, and prove in Theorem
6.3 that it converges to the analytic center of the optimal face.

Consider a sequence of interior primal-dual pairs $(x^{0^k}, s^{0^k})$, and all the
quantities that would be generated by applying one MTY step from each of
these points, namely $(u^{0^k}, v^{0^k})$, $(x^k, s^k)$, $(u^k, v^k)$, $(x^{+k}, s^{+k})$, $\mu^{0^k}$,
$\mu^k = \gamma^k \mu^{0^k}$, $w^{0^k}$, $w^k$, $\phi^k$. Again, we stress the fact that presently $(x^0, s^0)^{k+1}$ is not
necessarily related to $(x^+,s^+)^k$. Recall that we are denoting the analytic
center by $(x^*,s^*)$. Also the $\{B,N\}$ partition of the indices $\{1, \ldots, n\}$ is the
partition associated with the optimal face of the linear program in question.
Our main interest is in measuring how the large variables approach $x^*_B$. A
good metric for measuring this is given by the norm $\|\cdot\|_{x^*_B}$, defined on $\mathbb{R}^{|B|}$.
To simplify notation, we write

$$\|\cdot\|_* = \|\cdot\|_{x^*_B}.$$

Lemma 6.1 Let $(x^{0^k}, s^{0^k})$ be such that $\delta(x^{0^k}, s^{0^k}) \le 0.1$, and assume that
$\mu^{0^k} \to 0$, $(x^{0^k}, s^{0^k}) \to (\bar{x}, \bar{s})$, and $w^{0^k} \to w^0$. We have the following:

(i): If $\bar{x} = x^*$, then $u^k \to 0$ and $x^{+k} \to x^*$.

(ii): If $\bar{x} \ne x^*$, then for sufficiently large $k$,

$$\|x^{+k}_B - x^*_B\|_* \le 0.8\,\|x^{0^k}_B - x^*_B\|_*.$$

*The  reader  might  consider  Lemma  6.2  before  going  through  the  technical  proof  of 
Lemma  6.1. 
Proof. The proof consists of two technical parts and a conclusion. In
the first part we analyse the boundary behaviour of the MTY steps; in the
second part we describe the centering direction from $\bar{x}$ in the optimal face.
Finally, the conclusion is reached from the comparison of the results of the
first two parts.

We begin by considering MTY steps. From Lemma 5.1, $(u^{0^k}, v^{0^k}) \to 0$
and consequently $(x^k,s^k) \to (\bar{x}, \bar{s})$. From the same lemma, $u^k_N \to 0$. We
must describe the behaviour of $u^k_B$. From (10),

$$u^k = -x^k \phi^k P_{A X^k \Phi^k}\, \phi^k \left(\frac{x^k s^*}{\mu^k} - e\right).$$

We are now in a position to use Lemma 3.2 with $d = x\phi$ and
$p = -\phi\left(\frac{x s^*}{\mu} - e\right)$.

Our first task is to show that these two sequences converge. By hypothesis
$\|w(x^{0^k}, s^{0^k}) - e\| \le 0.1$. Hence $\|w(\bar{x}, \bar{s}) - e\| \le 0.1$. It follows that $w(\bar{x}, \bar{s}) > 0$.
We observed that $(x^k, s^k)$ also converges to $(\bar{x},\bar{s})$. This means that $\phi(x^k,s^k)$
converges to $\bar{\phi} = w(\bar{x},\bar{s})^{-1/2} > 0$. We have demonstrated that $d^k$ converges
to $\bar{d} = \bar{x}\bar{\phi}$. Now, $s^k$ converges to $\bar{s}$ and $w^k = x^k s^k/\mu^k$ converges to $\bar{w}$ implies
that $x^k_N/\mu^k$ converges to $\bar{s}_N^{-1}\bar{w}_N$, and hence $p^k_N$ converges. Since $s^*_B = 0$ we see
that $p^k_B = \phi^k_B$. This shows that both $d^k$ and $p^k$ converge. We can now apply
Lemma 3.2 to obtain

$$u^k_B \to \bar{u}_B = \bar{x}_B \bar{\phi}_B P_{A_B \bar{X}_B \bar{\Phi}_B}\, \bar{\phi}_B. \qquad (27)$$

Since $x^{+k} = x^{0^k} + u^{0^k} + u^k$ and $u^{0^k} \to 0$, $u^k_N \to 0$,

$$x^{+k} \to \bar{x}^+ = \bar{x} + \bar{u},$$

where $\bar{u}_N = 0$.

Our attention now goes to centering in the optimal face. Consider the
following primal centering direction associated with each $(x^{0^k}, s^{0^k})$:

$$h^k = -x^{0^k} P_{A X^{0^k}} \left(\frac{x^{0^k} s}{\mu^{0^k}} - e\right), \qquad (28)$$

where $s$ is an arbitrary dual slack (remember that $d P_{AD}\, d s = d P_{AD}\, d s'$ for
any dual slacks $s$, $s'$ and any scaling $d > 0$).

With $s = s^{0^k}$, we see that $h^k = -x^{0^k} P_{A X^{0^k}}(w^{0^k} - e)$. It follows that
$h^k_N \to 0$ and

$$\|h^k\|_{x^{0^k}} \le \|w^{0^k} - e\| = \delta(x^{0^k}, s^{0^k}) \le 0.1.$$

We now consider (28) with $s = s^*$. Lemma 3.2 with $d = x^0$ and
$p = -\frac{x^0 s^*}{\mu^0} + e$
can be used to determine the behaviour of $h^k$ once we demonstrate that $d^k$
and $p^k$ converge. In this case $d^k$ converges by hypothesis. Moreover, an
argument similar to the one used above will show that $p^k$ converges. Hence
Lemma 3.2 applies, and so $h^k \to \bar{h}$. From these latter two arguments we
have that

$$\bar{h}_N = 0, \quad \bar{h}_B = \bar{x}_B P_{A_B \bar{X}_B}\, e_B \quad \text{and} \quad \|\bar{h}_B\|_{\bar{x}_B} \le 0.1.$$

We conclude that $\bar{h}$ is the Newton centering direction in the optimal face,
and that the proximity measure of $\bar{x}$ is

$$\delta(\bar{x}_B) = \|\bar{h}_B\|_{\bar{x}_B} \le 0.1.$$

Let $y = \bar{x} + \bar{h}$ be the result of a primal centering step. Then by Lemma 4.3,

$$\|\bar{x}_B - x^*_B\|_* \ge 0.75\,\delta(\bar{x}_B),$$
$$\|y_B - x^*_B\|_* \le 1.05\,\delta^2(\bar{x}_B). \qquad (29)$$

Our attention now turns to shifted scaling. We study the effect of the
direction $\bar{u}_B$ defined in (27), when it is used for primal centering instead of
$\bar{h}$. The quantity

$$\bar{u}_B = \bar{x}_B \bar{\phi}_B P_{A_B \bar{X}_B \bar{\Phi}_B}\, \bar{\phi}_B$$

corresponds to $\bar{h}_B$ by way of a shifted scaling. Here $\phi = 1/\sqrt{w}$, as usual.
Since $\|w - e\| \le 0.1$, it follows that for $i = 1, \ldots, n$, $w_i \in [0.9, 1.1]$ and it is
trivial to check that $\phi_i \in [0.9, 1.1]$. Hence $\|\phi - e\|_\infty \le 0.1$, and by Lemma
3.3,

$$\|\bar{h}_B - \bar{u}_B\|_{\bar{x}_B} \le 0.3\,\|\bar{h}_B\|_{\bar{x}_B} = 0.3\,\delta(\bar{x}_B). \qquad (30)$$


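If $\bar{x} = x^*$, then $\delta(\bar{x}_B) = 0$ and it follows that $\bar{h}_B = \bar{u}_B = 0$. This proves part
(i) of the lemma. Assume from here on that $\|\bar{x}_B - x^*_B\|_* \ne 0$.

We need (30) in the norm $\|\cdot\|_*$. Using (26), define

$$\alpha = \|\bar{x}_B - x^*_B\|_{\bar{x}_B} \le \frac{\delta(\bar{x}_B)}{1 - \delta(\bar{x}_B)} \le \frac{0.1}{0.9}.$$

Using Lemma 3.4,

$$\|\bar{h}_B - \bar{u}_B\|_* \le \frac{1}{1 - \alpha}\,\|\bar{h}_B - \bar{u}_B\|_{\bar{x}_B}.$$

Merging this and (30) with $1/(1 - \alpha) \le 1.2$ we obtain

$$\|\bar{h}_B - \bar{u}_B\|_* \le 0.4\,\delta(\bar{x}_B). \qquad (31)$$

And now we compare the points $y_B = \bar{x}_B + \bar{h}_B$ and $\bar{x}^+_B = \bar{x}_B + \bar{u}_B$, using
(29). Specifically

$$\|\bar{x}^+_B - x^*_B\|_* \le \|y_B - x^*_B\|_* + \|\bar{x}^+_B - y_B\|_*$$
$$= \|y_B - x^*_B\|_* + \|\bar{u}_B - \bar{h}_B\|_*$$
$$\le 1.05\,\delta^2(\bar{x}_B) + 0.4\,\delta(\bar{x}_B)$$
$$\le 0.51\,\delta(\bar{x}_B).$$

Using (29), we conclude that

$$\frac{\|\bar{x}^+_B - x^*_B\|_*}{\|\bar{x}_B - x^*_B\|_*} \le \frac{0.51}{0.75} < 0.7.$$

Finally, we conclude from this expression that since $x^{0^k} \to \bar{x}$ and $x^{+k} \to \bar{x}^+$,
for sufficiently large $k$,

$$\|x^{+k}_B - x^*_B\|_* \le 0.8\,\|x^{0^k}_B - x^*_B\|_*,$$

completing the proof. ■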
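The concluding chain of constants in the proof of Lemma 6.1 can be checked with exact rational arithmetic. The sketch below only re-evaluates the numbers quoted in the proof at the worst case $\delta(\bar{x}_B) = 0.1$; the variable names are introduced here for illustration and are not from the paper.

```python
from fractions import Fraction

delta_max = Fraction(1, 10)                   # delta(x_B) <= 0.1
alpha = delta_max / (1 - delta_max)           # bound on alpha from (26): 1/9
norm_factor = 1 / (1 - alpha)                 # factor from Lemma 3.4: 9/8
shifted = Fraction(12, 10) * Fraction(3, 10)  # 1.2 * 0.3, used to obtain (31)
# 1.05*delta^2 + 0.4*delta <= (1.05*0.1 + 0.4)*delta when delta <= 0.1
total = Fraction(105, 100) * delta_max + Fraction(4, 10)
ratio = Fraction(51, 100) / Fraction(75, 100)  # contraction factor before the limit

print(alpha, norm_factor, shifted, total, ratio)
```

The printed values are the exact fractions behind the rounded constants 1.2, 0.4, 0.51 and 0.7 used in the proof; the final factor 0.8 in part (ii) leaves room for passing from the limit points back to the sequences.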

The lemma above studies convergent sequences $(x^{0^k}, s^{0^k})$. The next
lemma shows that the reduction in distance from $x^*$ can be extended
uniformly for nonconvergent sequences.

Lemma 6.2 Let $(x^{0^k}, s^{0^k})$ be such that $\delta(x^{0^k}, s^{0^k}) \le 0.1$ and $\mu^{0^k} \to 0$. Then
there exists a sequence of positive reals $\epsilon_k$ such that $\epsilon_k \to 0$ and for sufficiently
large $k$,

$$\|x^{+k}_B - x^*_B\|_* \le \max\{\epsilon_k,\ 0.8\,\|x^{0^k}_B - x^*_B\|_*\}.$$


Proof. Assume by contradiction that there exists $\epsilon > 0$ and a subsequence
of $(x^{0^k}, s^{0^k})$ with indices $K^0 \subset \mathbb{N}$ such that for $k \in K^0$,

$$\|x^{+k}_B - x^*_B\|_* > \epsilon, \qquad \|x^{+k}_B - x^*_B\|_* > 0.8\,\|x^{0^k}_B - x^*_B\|_*. \qquad (32)$$

The sequences $(x^{0^k}, s^{0^k})$, $(w^{0^k})$, $(w^k)$ are all in compact sets by construction,
and thus there must exist a subsequence with indices $K \subset K^0$ such that these
three sequences are convergent in $K$.

In particular, $(x^{+k}_B)_{k \in K}$ does not converge to $x^*_B$, due to (32). Applying
Lemma 6.1 (i), we see that $(x^{0^k})_{k \in K}$ does not converge to $x^*$, and thus (ii) must
hold for this subsequence. This contradicts (32), completing the proof. ■

Finally we are ready to establish our convergence result.


Theorem 6.3 Consider sequences $(x^{0^k}, s^{0^k})$, $(x^k,s^k)$ generated by the MTY
algorithm. Then $(x^{0^k}, s^{0^k}) \to (x^*,s^*)$ and $(x^k,s^k) \to (x^*,s^*)$, where $(x^*,s^*)$
is the analytic center of the solution set.


Proof. We prove the result for the primal variables. The proof for the
dual slacks is similar. Also, it is enough to prove that $x^{0^k} \to x^*$, since
$u^{0^k} = O(\mu^{0^k}) \to 0$.

Assume by contradiction that the sequence $\{x^{0^k}\}$ has an accumulation
point $\bar{x} \ne x^*$. Since $\bar{x}_N = x^*_N = 0$, we have

$$\sigma = \|\bar{x}_B - x^*_B\|_* > 0.$$

Let $\{\epsilon_k\}$ be the sequence guaranteed by Lemma 6.2, and let $\bar{k}$ be such that
the conclusions of that lemma are valid for $k \ge \bar{k}$. Choose an index $j \ge \bar{k}$
such that $\|x^{0^j}_B - x^*_B\|_* \le 1.1\sigma$, and such that for $k \ge j$, $\epsilon_k \le 0.5\sigma$. This
index exists because $\epsilon_k \to 0$ and $\bar{x}_B$ is an accumulation point of $\{x^{0^k}_B\}$.

We prove by induction that for any $k > j$, $\|x^{0^k}_B - x^*_B\|_* \le 0.9\sigma$.

(a) $\|x^{0^{j+1}}_B - x^*_B\|_* \le 0.8 \times 1.1\sigma \le 0.9\sigma$ by Lemma 6.2.

(b) Assume that for an index $k > j$, $\|x^{0^k}_B - x^*_B\|_* \le 0.9\sigma$. Then by Lemma
6.2, $\|x^{0^{k+1}}_B - x^*_B\|_* \le \max\{\epsilon_k,\ 0.8\,\|x^{0^k}_B - x^*_B\|_*\} \le 0.9\sigma$.

(a) and (b) prove that for all $k > j$, $\|x^{0^k}_B - x^*_B\|_* \le 0.9\sigma$, contradicting
the fact that $\sigma$ is an accumulation point of the sequence $(\|x^{0^k}_B - x^*_B\|_*)$, and
completing the proof. ■
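The induction in the proof of Theorem 6.3 can be illustrated by iterating the worst case allowed by Lemma 6.2, namely $d_{k+1} = \max\{\epsilon_k, 0.8\,d_k\}$ starting from $d_j = 1.1\sigma$. The sketch below is only a numerical illustration: the function name and the particular sequence $\epsilon_k$ are assumptions chosen to satisfy $\epsilon_k \le 0.5\sigma$ and $\epsilon_k \to 0$.

```python
def worst_case_distances(sigma, eps, steps):
    """Iterate d <- max(eps_k, 0.8 * d) from d = 1.1 * sigma."""
    d = 1.1 * sigma
    history = []
    for k in range(steps):
        d = max(eps[k], 0.8 * d)
        history.append(d)
    return history


sigma = 1.0
# Hypothetical tolerance sequence with eps_k <= 0.5 * sigma and eps_k -> 0.
eps = [0.5 * sigma / (k + 1) for k in range(30)]
history = worst_case_distances(sigma, eps, 30)
print(history[:3], max(history))
```

Every distance after index $j$ stays at or below $0.9\sigma$, so $\sigma$ cannot be an accumulation point of the distances, which is the contradiction used in the proof.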